Batch tpu calls in send-transaction-service #24083
Conversation
Force-pushed 6fc5fd1 to 2f9d23b
Codecov Report
@@            Coverage Diff            @@
##           master   #24083     +/-   ##
=========================================
- Coverage    82.0%    81.9%    -0.1%
=========================================
  Files         593      593
  Lines      163827   163956    +129
=========================================
+ Hits       134340   134399     +59
- Misses      29487    29557     +70
I think this change set can be much cleaner if we don't special-case batching and instead just batch everything with a default batch size of one. Thoughts @CriesofCarrots?
The logic specially added for batching has to be there -- unless you are questioning the logic itself. Using batch size = 1, which effectively disables batching, does not change that fact.
That's exactly what I'm suggesting. All RPC STS sends should use batching; the default is just size = 1. That is, we drop the old logic entirely in favor of the new batching logic. It will be much cleaner and less bug-prone than this shoehorned implementation.
I don't think it makes sense to drop the old logic altogether when batch size = 1 -- for example, firing right away without waiting for further entries. In my opinion, keeping that logic is important until we gain confidence in the batch change.
Maybe we can get rid of send_transaction being called in multiple places.
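As a hedged illustration of the point under discussion: with a flush-when-full rule, a batch size of 1 degenerates to the old send-immediately behavior. The function and types below are a hypothetical sketch, not the PR's actual API.

```rust
// Hypothetical sketch: flush when the batch reaches `batch_size`.
// With batch_size = 1, every push flushes immediately (the old behavior);
// transactions are modeled as plain u64 ids for illustration.
fn push_and_maybe_flush(batch: &mut Vec<u64>, tx: u64, batch_size: usize) -> Option<Vec<u64>> {
    batch.push(tx);
    if batch.len() >= batch_size {
        // Batch is full: hand it off for a single TPU send.
        Some(std::mem::take(batch))
    } else {
        // Keep accumulating until the size (or timeout) threshold is hit.
        None
    }
}

fn main() {
    let mut batch = Vec::new();
    // batch_size = 1: immediate flush, equivalent to unbatched sending.
    assert_eq!(push_and_maybe_flush(&mut batch, 42, 1), Some(vec![42]));
    // batch_size = 3: the first two pushes accumulate, the third flushes.
    assert_eq!(push_and_maybe_flush(&mut batch, 1, 3), None);
    assert_eq!(push_and_maybe_flush(&mut batch, 2, 3), None);
    assert_eq!(push_and_maybe_flush(&mut batch, 3, 3), Some(vec![1, 2, 3]));
}
```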
Force-pushed 5d03a97 to c05429c
Fyi, I should have a bench-tps that targets RPC/STS tomorrow-ish. I haven't tested rates yet, but hopefully it will be enough TPS to test the pathways you want here.
Thanks @CriesofCarrots. It would be great if you could also examine this logic, as the change is relatively large in a high-traffic path.
Force-pushed 58a0cf4 to a8760a6
let me know when this is ready for another pass
Thanks Trent -- the code itself can be reviewed -- it handles the case where we send asynchronously in non-batch mode. Right now I am setting up a cluster to validate whether the batching makes a difference. Will post some results.
Force-pushed 03bc1bd to d3932a2
Investigating the local-cluster test failure -- seems to be related to the changes in this PR... |
Force-pushed b6ca109 to d9f4a29
Hi Tyera -- will you be able to give it one more pass? Thanks @t-nelson for all the great feedback! |
Yep, I'm about halfway through. I have to step afk for a bit, but it will be done tonight! |
Okay, I think all I have are nits!
I saw a few of the things I see you and Trent already talked about, so looking forward to the iterations.
However, one question just to make sure I understand: the batch_send_rate_ms only affects the initial send, right? After that, batches are sent on the retry_rate_ms schedule?
@@ -61,6 +61,15 @@ while [[ -n $1 ]]; do
elif [[ $1 = --enable-rpc-bigtable-ledger-storage ]]; then
args+=("$1")
shift
elif [[ $1 = --tpu-use-quic ]]; then
Thanks for plumbing these
Correct @CriesofCarrots. This is a heuristic control to avoid extreme configurations of the two parameters. It is not a full rate-limiting implementation; that will require another PR.
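To illustrate the kind of heuristic guard being described, here is a minimal sketch that rejects extreme combinations of the two parameters. The parameter names follow the thread (`batch_send_rate_ms`, `retry_rate_ms`), but the specific check and threshold are illustrative assumptions, not the PR's actual validation logic.

```rust
// Illustrative guard (not the PR's real check): reject configurations
// where the initial batch-send cadence would outpace or starve retries.
fn validate_send_config(batch_send_rate_ms: u64, retry_rate_ms: u64) -> Result<(), String> {
    if batch_send_rate_ms == 0 {
        return Err("batch_send_rate_ms must be non-zero".to_string());
    }
    if batch_send_rate_ms > retry_rate_ms {
        // The initial send interval should not be slower than the retry
        // interval, or retries dominate the send schedule.
        return Err(format!(
            "batch_send_rate_ms ({}) must not exceed retry_rate_ms ({})",
            batch_send_rate_ms, retry_rate_ms
        ));
    }
    Ok(())
}

fn main() {
    // 200 ms batch sends with 2 s retries is a sane configuration.
    assert!(validate_send_config(200, 2000).is_ok());
    // Sending initial batches slower than the retry cadence is rejected.
    assert!(validate_send_config(5000, 2000).is_err());
}
```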
Co-authored-by: Tyera Eulberg <teulberg@gmail.com>
lgtm!
Introduced flag --tpu-do-batch2. Introduced a flag to control the batch size -- 100 by default. The default batch timeout is 200 ms and is configurable. When either the timeout expires or the batch size is filled, a new batch is sent. Batches honor the retry rate on transactions already sent. Introduced two threads in STS: one for receiving new transactions and doing the batch send, and one for retrying old transactions in batches. Fixes #
(cherry picked from commit 7c61e43)
Conflicts:
validator/src/main.rs
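The receive loop described above (flush when the batch fills or the timeout expires, whichever comes first) can be sketched with a standard-library channel. This is a minimal illustration under stated assumptions -- transactions modeled as `u64` ids, the PR's defaults of batch size 100 and a 200 ms timeout supplied as parameters -- not the actual STS implementation.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::{Duration, Instant};

// Sketch of the batching receive loop: accumulate incoming transactions
// and return a batch when either it reaches `batch_size` or `timeout`
// elapses (also returns early if the sender disconnects).
fn collect_batch(rx: &mpsc::Receiver<u64>, batch_size: usize, timeout: Duration) -> Vec<u64> {
    let mut batch = Vec::with_capacity(batch_size);
    let deadline = Instant::now() + timeout;
    loop {
        let remaining = deadline.saturating_duration_since(Instant::now());
        match rx.recv_timeout(remaining) {
            Ok(tx) => {
                batch.push(tx);
                if batch.len() >= batch_size {
                    break; // batch is full: send now
                }
            }
            Err(_) => break, // timeout or disconnect: send whatever we have
        }
    }
    batch
}

fn main() {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        for i in 0..5 {
            tx.send(i).unwrap();
        }
        // sender is dropped here, so later receives disconnect promptly
    });
    // Batch size 3 fills before the timeout; the remainder flushes after.
    let first = collect_batch(&rx, 3, Duration::from_millis(200));
    assert_eq!(first, vec![0, 1, 2]);
    let second = collect_batch(&rx, 3, Duration::from_millis(200));
    assert_eq!(second, vec![3, 4]);
}
```

In the PR's design, one thread runs a loop like this for newly received transactions while a second thread batches retries on the retry-rate schedule.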
Problem
Send transactions in batches to improve throughput
Summary of Changes
Fixes #